ABSTRACT

We present FLO (From Lines to Over-densities), a new technique to reconstruct the hydrogen density field from the Lya forest lines observed in high-resolution QSO spectra. The method is based on the hypothesis that the Lya lines arise in the low- to intermediate-density intergalactic gas and that the Jeans length is the typical size of the Lya absorbers. The reliability of FLO is tested against mock spectra obtained from cosmological simulations. The recovery algorithm gives satisfactory results in the range from the mean density to over-densities of ~30, and correctly reproduces the correlation function of the density field and the 1D power spectrum on scales between ~20 and 60 comoving Mpc. A sample of Lya forests from 22 high-resolution QSO spectra, covering the redshift range 1.7 < z < 3.5, is analysed. For each line of sight, we fit Voigt profiles to the lines of the Lya forest, providing the largest homogeneous sample of fitted Lya lines ever studied. The evolution of the line number density with redshift follows a power law: dn/dz = (166 ± 4) [(1 + z)/3.5]^(2.8 ± 0.2) (1σ errors). The two-point correlation function of lines shows a signal up to separations of ~2 comoving Mpc; weak lines (log N(H I) < 13.8) also show significant clustering, but on smaller scales (r < 1.5 comoving Mpc). We use FLO to estimate the hydrogen density field towards the 22 observed lines of sight. The redshift distribution of the average densities computed for each QSO is consistent with the cosmic mean hydrogen density in the analysed redshift range. We estimate the two-point correlation function and the 1D power spectrum of the δ field; both are consistent with the analogous results computed from hydro-simulated spectra obtained in the framework of the concordance cosmological model. The correlation function shows a clustering signal up to ~4 comoving Mpc.

Key words: intergalactic medium, quasars: absorption lines, cosmology: observations,
large-scale structure of Universe
… Observatory Very Large Telescope, Cerro Paranal, Chile (Programme 166.A-0106(A)) and during the commissioning and science verification of UVES.



of the low density intergalactic medium (IGM) that trace the underlying matter density field over cosmic time. The dynamical state of the low density IGM is governed mainly by the Hubble expansion and by gravitational instabilities. As a consequence, the physics involved is relatively simple and only mildly non-linear. The statistical analysis of the Lya forest provides information on the dynamical growth and thermal state of the IGM, and on the correlation properties of the (dark) matter in the Universe. Correlations of the Lya forest lines have been detected at 4-5σ confidence by various authors on typical scales Δv < 350 km s^-1, in high-resolution spectra of individual lines of sight (Cristiani et al. …). This velocity range corresponds to scales < 2.5 h^-1 Mpc (assuming negligible peculiar velocities). The "cosmic web" scenario (Bond et al. 1996) is favoured over that of a population of pressure-confined clouds (Sargent et al. 1980), thanks also to the analysis of the line correlation observed in close pairs of QSO lines of sight, which implies absorber sizes of a few hundred kpc (e.g. Smette et al. 1992, 1995; Bechtold et al. 1994; … Dinshaw et al. 1997; Crotts & Fang 1998; … 2001; Young et al. 2001; Becker et al. 2004). The analysis of multiple lines of sight at slightly larger separations (up to a few arcminutes) makes it possible to compute the transverse correlation function, for which a clustering signal is detected up to velocity separations of ~200 km s^-1, or about 3 h^-1 comoving Mpc (Rollinde et al. 2003; D'Odorico et al. 2006; Coppolani et al. 2006).

Traditionally, absorption spectra were decomposed into Voigt profiles, which were then identified with individual discrete absorption systems. Information on the physical state of the gas producing the absorption comes directly from the fit parameters: redshift, column density and Doppler broadening (linked to the temperature). In the cosmic-web paradigm the emphasis of the analysis has shifted to statistical measures of the transmitted flux (e.g. the flux power spectrum), which are better suited to absorption arising from a continuous density field. However, interpreting statistical quantities of the continuous flux field, and relating them to the physical properties of the gas, requires a non-trivial comparison with full hydro-dynamical high-resolution simulations, which are computationally expensive.

The aim of this paper is to extend the line fitting ap- 
proach by identifying a new statistical estimator linked to 
the physical properties of the underlying IGM. This new es- 
timator will also overcome the two main drawbacks of the 
Voigt fitting method: 

(i) the subjectivity of the decomposition into components: 
the same absorption can be resolved by different scientists 
(or software tools) in different ways, both in the number of 
components, and in the values of the output parameters for 
a single component; 

(ii) the blanketing effect of weak lines: they can be hidden by the stronger lines, so that their exact number density is unknown and has to be inferred from statistical arguments. Unfortunately, since the weak lines are also the most numerous, the uncertainty in their exact number translates into a systematic error in the computed statistical quantities.

The new estimator is the hydrogen density field, n_H, which is linked to the measured H I column densities through the formula (Schaye 2001):
$$N_{\rm HI} \simeq 2.7\times10^{13}\,{\rm cm^{-2}}\,(1+\delta)^{1.5-0.26\alpha}\,T_{0,4}^{-0.26}\,\Gamma_{12}^{-1}\left(\frac{1+z}{4}\right)^{9/2}\left(\frac{\Omega_b h^2}{0.02}\right)^{3/2}\left(\frac{f_g}{0.16}\right)^{1/2} \qquad (1)$$

where δ = n_H/⟨n_H⟩ - 1 is the density contrast, T_{0,4} = T_0/10^4 K, with T_0 the temperature at the mean density, Γ_12 = Γ/10^-12 s^-1 is the H I photo-ionisation rate, f_g ≈ Ω_b/Ω_m is the fraction of the mass in gas, and α depends on the ionisation history of the Universe. Equation (1) relies on three main hypotheses: (i) Lya absorbers are close to local hydrostatic equilibrium, i.e. their characteristic size is typically of the order of the local Jeans length (L_J); (ii) the gas is in photo-ionisation equilibrium; (iii) the equation of state T = T_0 (δ + 1)^α holds for the optically thin IGM gas (Hui & Gnedin 1997).

Table 1. Summary of the main characteristics of our QSO sample (see text).

The procedure to recover the H density field from the list of Lya line column densities in a QSO line of sight has been dubbed FLO (From Lines to Over-densities).
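As a worked illustration of the conversion at the heart of FLO, the sketch below evaluates eq. (1) in the Schaye (2001) form N(H I) ≈ 2.7 × 10^13 cm^-2 (1 + δ)^(1.5 - 0.26α) T_{0,4}^-0.26 Γ_12^-1 [(1 + z)/4]^(9/2) (Ω_b h²/0.02)^(3/2) (f_g/0.16)^(1/2) and inverts it to obtain δ from a measured column density. The normalisation and exponents are assumed from Schaye (2001), and the defaults are the values adopted elsewhere in this paper (T_0 = 1.8 × 10^4 K, α = 0.6, Γ_12 = 1, Ω_b = 0.0463, Ω_m = 0.26, h = 0.72); the function names are hypothetical. Under these assumptions it reproduces the thresholds quoted in Section 2.3: δ = 3 corresponds to log N(H I) ≈ 13.8 at z = 2.07 and ≈ 14.3 at z = 3.02.

```python
import math

N0 = 2.7e13  # cm^-2; assumed Schaye (2001) normalisation of eq. (1)


def log_nhi(delta, z, T0=1.8e4, alpha=0.6, gamma12=1.0,
            omega_b=0.0463, omega_m=0.26, h=0.72):
    """log10 N(HI) [cm^-2] of an absorber with density contrast delta at z (eq. 1)."""
    f_g = omega_b / omega_m  # gas mass fraction, f_g ~ Omega_b / Omega_m
    return (math.log10(N0)
            + (1.5 - 0.26 * alpha) * math.log10(1.0 + delta)
            - 0.26 * math.log10(T0 / 1e4)
            - math.log10(gamma12)
            + 4.5 * math.log10((1.0 + z) / 4.0)
            + 1.5 * math.log10(omega_b * h ** 2 / 0.02)
            + 0.5 * math.log10(f_g / 0.16))


def delta_from_nhi(log_n, z, **params):
    """Invert eq. (1): density contrast delta for a measured log10 N(HI)."""
    alpha = params.get("alpha", 0.6)
    slope = 1.5 - 0.26 * alpha               # d log N / d log(1 + delta)
    log_n_mean = log_nhi(0.0, z, **params)   # column density at the mean density
    return 10.0 ** ((log_n - log_n_mean) / slope) - 1.0


# The same delta = 3 cut maps onto different column densities at the
# mean redshifts of the two sub-samples used in Section 2.3.
print(round(log_nhi(3.0, 2.07), 1), round(log_nhi(3.0, 3.02), 1))  # 13.8 14.3
```

Because log N(H I) is linear in log(1 + δ), the inversion only needs the column density at the mean density and the slope 1.5 - 0.26α; the same column density therefore selects lower density contrasts at higher redshift.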

The paper is organised as follows: Section 2 describes 
the observed data sample used for our analysis and presents 
the statistical measures obtained for the fitted Lya lines; 
Section 3 introduces the hydrogen density field as a statis- 
tical estimator, and describes the construction algorithm; 
Section 4 presents the simulated spectra and the test of reli- 
ability of the method with this dataset; in Section 5 the new 
algorithm is applied to the observed data sample; finally, we 
draw our conclusions in Section 6. 

The cosmological model adopted throughout this paper corresponds to a 'fiducial' ΛCDM Universe with parameters, at z = 0, Ω_m = 0.26, Ω_Λ = 0.74, Ω_b = 0.0463, n_s = 0.95, σ_8 = 0.85 and H_0 = 72 km s^-1 Mpc^-1 (the B2 set of parameters of Viel, Haehnelt & Springel 2004).



2 OBSERVED DATA SAMPLE 

Most of the observational data used in this work were obtained with the UVES spectrograph (Dekker et al. 2000) at the Kueyen unit of the ESO VLT (Cerro Paranal, Chile) in the framework of the ESO Large Programme (LP) "The Cosmic Evolution of the IGM" (Bergeron et al. 2004). Spectra of 18 QSOs were obtained in service mode with the aim of studying the physics of the IGM in the redshift range 1.7-3.5. The spectra have a resolution R ≈ 45000 and a typical signal-to-noise ratio (SNR) of ~35 and ~70 per pixel at 3500 and 6000 Å, respectively. Details of the data reduction can be found in Chand et al. (2004) and Aracil et al. (2004).

Figure 1. Lya forest redshift coverage of the QSOs in our sample.

Figure 2. Portion of the Lya forest for an observed (lower panel) and a simulated (upper panel) line of sight in our sample.
We added to the main sample 4 more QSO spectra with comparable resolution and SNR:

- J2233-606 (Cristiani & D'Odorico 2000). Data for this QSO were acquired during the commissioning of UVES in October 1999.

- HE1122-1648 (Kim et al. 2002). Data for this QSO were acquired during the science verification of UVES in February 2000. The reduced and fitted spectrum was kindly provided to us by Tae-Sun Kim.

- HS1946+7658 (Kirkman & Tytler 1997). Data for this QSO were acquired with Keck/HIRES in July 1994.

- B1422+231 (Rauch et al. 1996). Data for this QSO were acquired with Keck/HIRES in 1996. The reduced and fitted spectrum was kindly provided to us by Tae-Sun Kim.

Table 1 summarises the main properties of our QSO sample. None of our QSOs is a Broad Absorption Line (BAL) object. Magnitudes are taken from the GSC-II catalogue (McLean et al. 2000). Figure 1 shows the redshift distribution of the Lya forests for all the QSOs of the sample. For each QSO we considered the redshift range between 1000 km s^-1 redward of the Lyβ emission, in order to avoid contamination by associated Lyβ lines, and 5000 km s^-1 blueward of the Lya emission, to exclude the region affected by the proximity effect due to the ionising flux of the QSO. The coverage is good over the whole redshift range z ~ 1.7-3.5, with most of the signal concentrated between z ~ 2 and 2.5. In Fig. 2 we show a portion of the Lya forest of the QSO HE0001-2340 compared with the same wavelength region in a mock spectrum extracted from the simulation box at z = 2 (see Section 4).
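The redshift window described above can be computed directly from the emission redshift. The following is a minimal sketch, assuming rest wavelengths of 1215.67 Å (Lya) and 1025.72 Å (Lyβ) and non-relativistic Doppler offsets; the function name is hypothetical and not part of the paper's pipeline.

```python
C_KMS = 299792.458             # speed of light [km/s]
LYA, LYB = 1215.67, 1025.72    # rest wavelengths of Lya and Lyb [Angstrom]


def lya_forest_range(z_em, dv_red=1000.0, dv_blue=5000.0):
    """Redshift interval of the usable Lya forest for a QSO at z_em.

    Lower edge: dv_red km/s redward of the Lyb emission (avoids Lyb lines).
    Upper edge: dv_blue km/s blueward of the Lya emission (avoids the
    proximity region). Offsets use the non-relativistic Doppler formula.
    """
    lam_lo = LYB * (1.0 + z_em) * (1.0 + dv_red / C_KMS)
    lam_hi = LYA * (1.0 + z_em) * (1.0 - dv_blue / C_KMS)
    return lam_lo / LYA - 1.0, lam_hi / LYA - 1.0


z_lo, z_hi = lya_forest_range(2.407)   # e.g. Q0109-3518
print(f"{z_lo:.3f} < z < {z_hi:.3f}")
```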

2.1 Creation of the line lists 

All the lines in the Lya regions of the LP QSOs plus J2233-606 were fitted with the FITLYMAN tool (Fontana & Ballester 1995) of the ESO MIDAS data reduction package¹. In the case of complex saturated lines, we used the minimum number of components needed to reach χ² ≲ 1.5. Whenever possible, the other lines of the Lyman series were used to constrain the fit. The spectra of HE1122-1648, HS1946+7658 and B1422+231, and all the simulated lines of sight, were fitted with the VPFIT package². Both software tools model absorption features with a Voigt profile convolved with the instrumental line spread function. The minimum H I column density detectable at 3σ, at the lowest SNR of the spectra in our sample, is log N(H I) ≈ 12 cm^-2.

Metals in the forest were identified and the corresponding spectral regions were masked to avoid effects of line blanketing. We eliminated Lya lines with Doppler parameters b < 10 km s^-1, which are likely unidentified metal absorptions. Of a total of 8435 fitted Lya lines, 368 (4.4%) fall in the masked intervals, 1150 (13.6%) lie less than 1000 km s^-1 redward of the Lyβ emission or less than 5000 km s^-1 blueward of the Lya emission, and 599 (7.1%) were eliminated because they have b < 10 km s^-1. The output of this analysis is a list of Lya lines for each QSO with central redshift, H I column density and Doppler parameter.
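The selection just described (redshift window, metal masks, Doppler-parameter cut) amounts to a simple filter over the fitted line list. A minimal sketch follows; the `Line` record and `select_lines` function are hypothetical, and the optional `b_max` reproduces the additional b ≤ 100 km s^-1 cut applied to the simulated spectra in Section 4.1.

```python
from dataclasses import dataclass


@dataclass
class Line:
    z: float     # central redshift
    logn: float  # log10 N(HI) [cm^-2]
    b: float     # Doppler parameter [km/s]


def select_lines(lines, z_lo, z_hi, masks=(), b_min=10.0, b_max=None):
    """Keep lines inside [z_lo, z_hi], outside metal masks, with b_min <= b (<= b_max)."""
    kept = []
    for line in lines:
        if not z_lo <= line.z <= z_hi:
            continue  # Lyb / proximity-effect region
        if any(m_lo <= line.z <= m_hi for m_lo, m_hi in masks):
            continue  # falls in a masked metal absorption
        if line.b < b_min or (b_max is not None and line.b > b_max):
            continue  # likely metal line (or continuum ripple in simulations)
        kept.append(line)
    return kept


lines = [Line(2.10, 13.2, 25.0), Line(2.12, 12.5, 6.0),
         Line(2.20, 14.1, 30.0), Line(2.45, 13.0, 20.0)]
print([l.z for l in select_lines(lines, 1.88, 2.35, masks=[(2.19, 2.21)])])  # [2.1]
```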

In the line fitting approach to the study of the Lya forest, each line is considered as the signature of an absorber. As a consequence, statistical measures are computed with the population of absorption lines, taken as representative of the population of absorbers. Our sample of fitted Lya lines is the largest homogeneous sample gathered to date. We use it to compute the redshift evolution of the line number density and the two-point correlation function of the lines.
Figure 3. Number density evolution of the Lya forest lines over the column density range 13.64 < log N(H I) < 17 cm^-2 for the 22 QSOs in our sample (open triangles). The solid line traces the best linear fit obtained for those data (see text). For comparison, we also report previous measurements at high redshift and the result of the low-redshift HST campaign.




2.2 Line number density evolution

The line number density per unit redshift is generally approximated as dn/dz = (dn/dz)_0 (1 + z)^β, where (dn/dz)_0 is the local comoving line number density of the forest and the exponent β depends both on physical (redshift, column density interval) and instrumental (spectral resolution, decomposition of velocity profiles) factors.

In Fig. 3 we plot the result for the QSOs in our sample for the standard column density interval 13.64 < log N(H I) < 17 cm^-2, in order to compare our statistics with the HST low-redshift measurements³ (Weymann et al. 1998). The best fit to our data gives dn/dz = (166 ± 4) [(1 + z)/3.5]^(2.8 ± 0.2) (1σ errors). There is no substantial change in the trend with respect to previous results by Kim and collaborators (2001, 2002), who used smaller samples of UVES QSO spectra of the same quality. However, our points are systematically higher in the plot, with an increase in log dn/dz ranging from ~0.03 at z ~ 2 up to ~0.1 at z ~ 3. The discrepancy arises because we have taken into account the decrease in the available redshift interval due to metal lines 'masking' the Lya features. High-resolution spectra allow the identification of a larger number of metal lines: in our sample these metal masks correspond to about 9 per cent of the total redshift interval covered by the observed Lya forests.
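The effect of the metal masks on dn/dz, and the power-law fit quoted above, can be sketched as follows. This is an illustration under stated assumptions, not the paper's code: masks are taken to be non-overlapping redshift intervals whose length is removed from the available path, the fit is an unweighted straight line in log10 dn/dz versus log10[(1 + z)/3.5] (the actual weighting is not specified in the text), and all function names are hypothetical.

```python
import numpy as np


def dn_dz(line_z, z_lo, z_hi, masks=()):
    """Lines per unit redshift in [z_lo, z_hi), excluding masked intervals.

    The masked redshift path is subtracted from the available interval, so
    regions hidden by metal lines do not dilute dn/dz. Masks must not overlap.
    """
    z = np.asarray(line_z, dtype=float)
    keep = (z >= z_lo) & (z < z_hi)
    path = z_hi - z_lo
    for m_lo, m_hi in masks:
        lo, hi = max(m_lo, z_lo), min(m_hi, z_hi)
        if hi > lo:
            path -= hi - lo
            keep &= ~((z >= lo) & (z < hi))
    return keep.sum() / path


def power_law(z, a=166.0, slope=2.8):
    """Best-fit relation quoted in the text: dn/dz = a [(1 + z)/3.5]^slope."""
    return a * ((1.0 + np.asarray(z)) / 3.5) ** slope


def fit_power_law(z, dndz):
    """Least-squares line in log10 dn/dz vs log10[(1 + z)/3.5]; returns (a, slope)."""
    slope, intercept = np.polyfit(np.log10((1.0 + np.asarray(z)) / 3.5),
                                  np.log10(dndz), 1)
    return 10.0 ** intercept, slope
```

Normalising to (1 + z)/3.5 puts the pivot at z = 2.5, near the centre of the sample, which reduces the correlation between the fitted amplitude and slope.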

Fig. 4 shows the number density evolution for two different H I column density ranges: 13 ≤ log N(H I) ≤ 14 and 14.5 < log N(H I) < 17 cm^-2. The linear fits in these intervals give slopes of 1.9 ± 0.2 and 3.8 ± 0.4 for the weak and the strong line selections, respectively. This trend was already noticed by Kim et al. (2002): stronger lines have a steeper number density evolution than weaker ones.



2.3 Two-point correlation function of Lya lines 

To study the clustering properties of our sample of Lya lines, we adopt the standard two-point correlation function (TPCF), defined as the excess, due to clustering, of the probability dP of finding a Lya absorber in a volume dV at a distance r from another absorber: dP = Φ_Lya(z) dV [1 + ξ(r)], where Φ_Lya(z) is the average space density of the absorbers as a function of z.

Operatively, this quantity is estimated with the formula (Peebles 1980):

$$\xi(\Delta v) = \frac{N_{\rm obs}(\Delta v)}{N_{\rm exp}(\Delta v)} - 1 \qquad (2)$$

where N_obs is the observed number of line pairs with velocity separations between v and v + dv, and N_exp is the number of pairs expected in the same range of separations from a random distribution in redshift. Since in this context peculiar velocities are negligible (see e.g. Rauch et al. 2005), we compute the correlation function in real space, measuring separations in comoving Mpc. At the characteristic redshift of our sample, z ≈ 2.5, a velocity separation Δv = 100 km s^-1 corresponds to Δr ≈ 0.9 comoving Mpc in our fiducial cosmology. N_exp is obtained by averaging over 1000 random realisations of the line list of each QSO spectrum, each containing the observed number of lines. In particular, the line redshifts are randomly generated in the same redshift interval as the data, according to the observed distribution ∝ (1 + z)^β, where we adopt the value β = 2.8 found in the previous section. The same mock line lists are used to estimate the error on the observed correlation function, as the 1σ standard deviation of the correlation functions of the randomly distributed lines. Lines closer than 0.3 comoving Mpc are merged into a single line with redshift equal to the column-density-weighted mean redshift and column density equal to the sum of the column densities. This minimal separation is set by the intrinsic blending due to the typical width of the lines (see Giallongo et al. 1996).

Figure 4. Number density evolution of the Lya forest lines over the two column density ranges 13 ≤ log N(H I) ≤ 14 (crosses) and 14.5 < log N(H I) < 17 cm^-2 (open triangles) for the 22 QSOs in our sample. The lines are the best linear fits for the two distributions (see text).

1 http://www.eso.org/midas

2 http://www.ast.cam.ac.uk/~rfc/vpfit.html

3 The lower limit in column density is due to the fact that HST measurements have been transformed from equivalent widths into H I column densities assuming a typical Doppler parameter of 30 km s^-1.

Figure 5. Two-point correlation function for the observed Lya lines in the column density range 12 < log N(H I) < 17 cm^-2. In the bottom panel, lines closer than one Jeans length have been merged into one line (see text). The dashed lines represent the 1σ confidence levels from a random distribution of lines.

Figure 6. Two-point correlation function for the observed Lya lines in two column density ranges, as reported in the panels.

Figure 7. Two-point correlation function for the observed Lya lines in the column density range 13.8 < log N(H I) < 17 and in two redshift ranges, as reported in the panels.

We compute the correlation function for the whole data set (Fig. 5) and for two column density cuts (Fig. 6), to investigate the clustering properties of strong and weak lines. Previous results (Cristiani et al. 1995; Lu et al. 1996; Cristiani et al. 1997) already showed a significant clustering signal for strong absorptions, which is confirmed and strengthened by our data. Furthermore, we also see significant clustering for the weak lines, consistent with previous results by Misawa et al. (2004). The amplitude is about one order of magnitude lower than for the stronger lines, but the clustering signal in the first bin is significant at the 7σ level.
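The estimator of eq. (2), with the random-catalogue procedure described above, can be sketched as follows. This is a minimal illustration, not the code used for this paper: the function names are hypothetical; separations are converted to comoving Mpc with the small-separation approximation Δr = c Δz / H(z̄) in the fiducial cosmology; each random catalogue has the same number of lines and redshift interval as the corresponding QSO, with dn/dz ∝ (1 + z)^β; and the merging of lines closer than 0.3 comoving Mpc is omitted for brevity.

```python
import numpy as np

C_KMS = 299792.458  # speed of light [km/s]


def hubble(z, om=0.26, ol=0.74, h0=72.0):
    """H(z) [km/s/Mpc] for the fiducial flat LCDM model."""
    return h0 * np.sqrt(om * (1.0 + z) ** 3 + ol)


def pair_separations(z):
    """Comoving separations [Mpc] of all line pairs in one line of sight,
    using the small-separation approximation dr = c dz / H(z_mean)."""
    z = np.sort(np.asarray(z, dtype=float))
    i, j = np.triu_indices(len(z), k=1)
    return C_KMS * (z[j] - z[i]) / hubble(0.5 * (z[i] + z[j]))


def random_redshifts(n, z_lo, z_hi, beta, rng):
    """Draw n redshifts with dn/dz proportional to (1 + z)^beta (inverse CDF)."""
    a, b = (1.0 + z_lo) ** (beta + 1), (1.0 + z_hi) ** (beta + 1)
    return (a + rng.random(n) * (b - a)) ** (1.0 / (beta + 1)) - 1.0


def tpcf(sightlines, bins, beta=2.8, n_random=1000, seed=0):
    """xi = N_obs / N_exp - 1 (eq. 2) and its 1-sigma error from random lines.

    sightlines: list of (line redshifts, z_lo, z_hi), one entry per QSO.
    """
    rng = np.random.default_rng(seed)
    n_obs = sum(np.histogram(pair_separations(z), bins)[0]
                for z, _, _ in sightlines)
    n_rand = np.zeros((n_random, len(bins) - 1))
    for k in range(n_random):
        for z, z_lo, z_hi in sightlines:
            z_r = random_redshifts(len(z), z_lo, z_hi, beta, rng)
            n_rand[k] += np.histogram(pair_separations(z_r), bins)[0]
    n_exp = n_rand.mean(axis=0)
    xi = n_obs / n_exp - 1.0
    sigma = (n_rand / n_exp - 1.0).std(axis=0)  # scatter of xi for random lines
    return xi, sigma
```

Drawing the random redshifts with the same (1 + z)^β law as the data ensures that the redshift evolution of dn/dz is not mistaken for clustering.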

As already mentioned in Section 1, the Jeans length (L_J) likely represents the typical size of the IGM structures detected as Lya absorptions. This length (varying from ~1.2 to 1.6 comoving Mpc at the maximum and minimum redshift of our sample, respectively) is also comparable with the clustering scale of the Lya lines shown in Figs. 5 (upper panel) and 6. To verify that the clustering signal we detect is not due only to structures internal to the absorbers, we perform the following test. Lines with separation less than the local L_J are merged into a single line, with column density equal to the sum of the column densities of the component features and redshift equal to the N(H I)-weighted mean of the component redshifts, and the TPCF is re-computed. The result, reported in the lower panel of Fig. 5, shows that the clustering signal is preserved at substantially the same level as that computed with all the lines, with a slightly lower significance due to the smaller statistics. This indicates that Lya absorbers cluster among themselves and not only within themselves.

Figure 8. Two-point correlation function for the observed Lya lines in the high-redshift range. Here the cut in column density is defined to correspond to a constant cut in density contrast, δ > 3, i.e. log N(H I) > 14.3 at z = 3.02 (see text).

The present data set is large enough to allow a study of the evolution of the correlation function with redshift. We consider the column density range for which the signal is stronger, 13.8 < log N(H I) < 17, and divide our sample into two parts. The first sub-sample is formed by objects with emission redshift z_em ≤ 2.5, for which the average Lya forest redshift is ⟨z_Lya⟩ = 2.07, and the second by objects with z_em > 2.5 and ⟨z_Lya⟩ = 3.02. Results are shown in Fig. 7: the high-redshift lines are less clustered than the low-redshift lines. This apparent evolution with redshift is biased by the fact that the relation between δ and log N(H I) is also z-dependent. Indeed, the same column density range selects objects with a lower density contrast at higher redshift (see eq. 1), which explains the lower clustering signal. To verify this effect, we selected lines on the basis of a constant density contrast, δ > 3, which corresponds to log N(H I) > 13.8 at the average redshift of the low-redshift sub-sample and to log N(H I) > 14.3 at the higher average redshift. The correlation function for the latter sub-sample is shown in Fig. 8. When the same kind of structures is selected, there is no longer evidence of a significant evolution with redshift.

Table 2 gives a detailed budget of the number of lines used to compute the TPCF in all the different selections described above.



3 INTRODUCING FLO 

In Section 1 we described, on the one hand, the main drawbacks of the two standard approaches (Voigt fitting and flux statistics) adopted to analyse the Lya forest and derive statistical quantities describing the physical state of the IGM. On the other hand, we introduced the recovered H density field as a new robust estimator, whose statistical properties are in good agreement with those of the original density field and which allows an easy comparison between observations and simulations. The relation between the underlying H density field and the H I column densities measured for the observed absorption lines is summarised by eq. (1).

Before describing the FLO technique in detail, it is important to recall its main hypotheses:

1. Lya absorbers have typical sizes of the order of the local L_J, which can be approximated as (Zaroubi et al. 2006):

[Equation (3): L_J ≈ 1.33 Mpc multiplied by power-law factors, with exponents ±1/2, in Ω_m h²/0.135, T_0 (normalised to 10^4 K) and α + 1 (normalised to 1.6); the full expression is not recoverable]     (3)

in comoving units, where h = H_0/(100 km s^-1 Mpc^-1) and the other parameters have already been defined;
2. the IGM gas is in the linear or slightly non-linear regime (log(δ + 1) ≲ 1; Hui & Gnedin 1997).

To apply eq. (1) we first have to Voigt-fit the Lya forest absorptions in a QSO spectrum. Then, to transform the list of H I column densities of Lya lines into the matter density field that generated them, we perform the following steps:
(1) group Lya lines into absorbers of size 1 L_J, with column density equal to the sum of the column densities and redshift equal to the average of the redshifts weighted by the column densities. The absorbers are created with a friends-of-friends algorithm:

(i) the spatial separation between all the possible line pairs is computed, and the minimum separation is compared with L_J, computed at the N(H I)-weighted mean redshift of the pair;

(ii) if the two lines of the pair are more distant than the local L_J, they are classified as two different absorbers, stored and deleted from the line list;

(iii) if the two lines are closer than the local L_J, they are replaced in the line list by one line with a redshift equal to the N(H I)-weighted mean of the two redshifts and a column density equal to the sum of the two column densities;

(iv) the procedure is iterated until all the lines are converted into absorbers.
Table 2. Detailed budget of the number of lines used to compute TPCFs. The first column refers to the selection carried out, in terms of emission redshift of the considered objects.

Selection | n_QSO | N^a | N^b_mask | N^c_b | N^d_prox | N^e_elim | N^f_merg | N^g_col | N^h_fin
all QSOs; 13.8 < log N(H I) < 17; Δr > 0.3 com. Mpc | 22 | 8435 | 368 | 644 | 1150 | 1955 | 380 | 4953 | 1147
all QSOs; 12 < log N(H I) < 13.8; Δr > 0.3 com. Mpc | 22 | 8435 | 368 | 644 | 1150 | 1955 | 380 | 1319 | 4781
all QSOs; 12 < log N(H I) < 17; Δr > 0.3 com. Mpc | 22 | 8435 | 368 | 644 | 1150 | 1955 | 380 | 170 | 5930
z_em ≤ 2.5; 13.8 < log N(H I) < 17; Δr > 0.3 com. Mpc | 11 | 3188 | 103 | 169 | 445 | 665 | 100 | 2042 | 381
z_em > 2.5; 13.8 < log N(H I) < 17; Δr > 0.3 com. Mpc | 11 | 5247 | 265 | 475 | 705 | 1290 | 280 | 2911 | 766
z_em > 2.5; 14.3 < log N(H I) < 17; Δr > 0.3 com. Mpc | 11 | 5247 | 265 | 475 | 705 | 1290 | 280 | 3345 | 332
all QSOs; 12 < log N(H I) < 17; Δr > 1 L_J | 22 | 8435 | 368 | 644 | 1150 | 1955 | 2944 | … | 3472

a total number of fitted Lya lines; b number of Lya lines falling in the metal masks; c number of Lya lines with b < 10 or b > 100 km s^-1; d number of lines falling closer than 1000 km s^-1 redward of the Lyβ emission or closer than 5000 km s^-1 blueward of the Lya emission; e number of lines eliminated because at least one of the three previous conditions occurs; f number of lines merged because their separation is less than the Δr threshold indicated in the selection; g number of merged lines not fulfilling the column density selection; h number of lines used to compute the TPCF.



(2) transform the list of column densities of absorbers into a list of δ values with eq. (1);

(3) bin the redshift range covered by the Lya forest into steps of 1 L_J and distribute the absorbers onto this grid, proportionally to the overlap between the absorber size (which is again 1 L_J) and the bin. Empty bins are filled with one absorber whose hydrogen density contrast corresponds to the minimum detectable column density in our data, log N(H I) = 12 cm^-2, at the redshift of the bin;

(4) normalise the resulting δ field to have ⟨δ + 1⟩ = 1.0 for the whole considered sample. This operation is necessary to recover the correct asymptotic behaviour of the correlation function.
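Steps (1)-(4) can be condensed into the following sketch for one line of sight plus the sample-wide normalisation. It is an illustration under explicit assumptions, not the authors' implementation: the Jeans length is passed in as a callable `lj_of_z` (eq. 3 is not implemented), the δ(N) conversion as a callable `delta_of_logn` (for instance an inversion of eq. 1); comoving distances use the fiducial flat ΛCDM model; the closest pair in step (1) is assumed to be a pair of neighbours in redshift; and in step (3) we assume that it is 1 + δ, weighted by the fractional overlap, that is shared among cells, since the text does not specify this detail. All names are hypothetical.

```python
import numpy as np

C_KMS = 299792.458  # speed of light [km/s]


def hubble(z, om=0.26, ol=0.74, h0=72.0):
    """H(z) [km/s/Mpc] for the fiducial flat LCDM model."""
    return h0 * np.sqrt(om * (1.0 + z) ** 3 + ol)


def merge_lines(z, logn, lj_of_z):
    """Step (1): friends-of-friends grouping into (redshift, N_HI) absorbers."""
    lines = sorted(zip(z, 10.0 ** np.asarray(logn, dtype=float)))
    absorbers = []
    while len(lines) > 1:
        # Comoving separations [Mpc] of neighbouring lines; in 1D the closest
        # pair is assumed to be a pair of neighbours in redshift.
        seps = [C_KMS * (b[0] - a[0]) / hubble(0.5 * (a[0] + b[0]))
                for a, b in zip(lines[:-1], lines[1:])]
        k = int(np.argmin(seps))
        (z1, n1), (z2, n2) = lines[k], lines[k + 1]
        z_w = (n1 * z1 + n2 * z2) / (n1 + n2)   # N_HI-weighted mean redshift
        if seps[k] > lj_of_z(z_w):
            absorbers += lines[k:k + 2]         # (ii) store as two absorbers
            del lines[k:k + 2]
        else:
            lines[k:k + 2] = [(z_w, n1 + n2)]   # (iii) replace pair by one line
    return sorted(absorbers + lines)


def flo_field(z, logn, z_lo, z_hi, lj_of_z, delta_of_logn, logn_min=12.0):
    """Steps (1)-(3) for one line of sight: delta in cells about 1 L_J wide."""
    # Comoving distance r(z) - r(z_lo) [Mpc], tabulated with the trapezoid rule.
    zg = np.linspace(z_lo, z_hi, 2000)
    f = C_KMS / hubble(zg)
    rg = np.concatenate([[0.0], np.cumsum(0.5 * (f[1:] + f[:-1]) * np.diff(zg))])
    edges = [0.0]  # cell edges: step forward by the local Jeans length
    while edges[-1] < rg[-1]:
        edges.append(edges[-1] + lj_of_z(np.interp(edges[-1], rg, zg)))
    edges = np.array(edges)
    width = np.diff(edges)
    dens = np.zeros(len(width))    # accumulated (1 + delta)
    cover = np.zeros(len(width))   # accumulated overlap fraction
    for z_a, n_a in merge_lines(z, logn, lj_of_z):
        d = delta_of_logn(np.log10(n_a), z_a)                 # step (2)
        r_a, half = np.interp(z_a, zg, rg), 0.5 * lj_of_z(z_a)
        overlap = (np.minimum(r_a + half, edges[1:])
                   - np.maximum(r_a - half, edges[:-1]))
        w = np.clip(overlap, 0.0, None) / width               # covered fraction
        dens += w * (1.0 + d)
        cover += w
    z_c = np.interp(0.5 * (edges[1:] + edges[:-1]), rg, zg)
    empty = cover == 0.0
    # Empty cells: density contrast of the minimum detectable column density.
    dens[empty] = [1.0 + delta_of_logn(logn_min, zc) for zc in z_c[empty]]
    return z_c, dens - 1.0


def normalise(fields):
    """Step (4): rescale so that <1 + delta> = 1 over the whole sample."""
    mean = np.mean(np.concatenate([1.0 + f for f in fields]))
    return [(1.0 + f) / mean - 1.0 for f in fields]
```

Making cells and absorbers the same width (one local L_J) means each absorber contributes to at most two neighbouring cells, which keeps the redistribution in step (3) simple.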

With the introduction of this new statistical estimator, the drawbacks of the standard Voigt fitting approach are significantly reduced. On the one hand, the statistical weight of weak lines is reduced, since their contribution to the δ field is small. On the other hand, we verify that, when Voigt fitting complex absorption features, the total H I column density is a much more robust quantity than the number of components. To this end, we compare the line lists of a sub-sample of 12 QSOs adopted in the present work with the corresponding lists obtained with the VPFIT package, kindly provided to us by Tae-Sun Kim. The total number of lines in each line of sight is not conserved; in particular, significant differences are observed for the complex absorption systems where, in general, VPFIT fits more lines than FITLYMAN. Most of these discrepancies are due to the identification of low column density lines. However, the total column density in these complex absorbers appears to be much more stable between the two fitting methods.

In Fig. 9 we plot the comparison between FITLYMAN and VPFIT for one QSO of the sample, Q0109-3518 (z_em = 2.407). We divide the line of sight into redshift bins of width Δz = 0.01, sum both the number and the column densities of the lines in each bin, and plot them against redshift. While the numbers of lines differ, the two column density distributions trace each other much more faithfully.
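The binning used for this comparison is straightforward to reproduce. A minimal sketch, assuming column densities are summed linearly (not in log) within each Δz = 0.01 bin; the function name is hypothetical. Applying it to two line lists of the same spectrum gives the two curves of each panel in Fig. 9.

```python
import numpy as np


def bin_line_list(z, logn, z_lo, z_hi, dz=0.01):
    """Number of lines and summed N(HI) [cm^-2] per redshift bin of width dz."""
    n_bins = int(round((z_hi - z_lo) / dz))
    edges = z_lo + dz * np.arange(n_bins + 1)
    counts, _ = np.histogram(z, bins=edges)
    col_sum, _ = np.histogram(z, bins=edges,
                              weights=10.0 ** np.asarray(logn, dtype=float))
    return 0.5 * (edges[1:] + edges[:-1]), counts, col_sum
```

Summing linear column densities means that splitting one absorber into several components changes the line count in a bin but leaves the summed N(H I) essentially unchanged, which is the robustness property exploited by FLO.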

VPFIT has been used to fit the lines of 3 QSOs in our sample (see Section 2) and also to analyse the output spectra from the simulation (see next section). We verify the stability of FLO against different fitting tools by applying it to the line lists obtained with VPFIT and with FITLYMAN for the 12 common QSOs. In Fig. 10 we show the comparison between the two recovered fields by means of a contour scatter plot. The correlation is tight for all values of δ; the scatter increases slightly for δ < 0.

Figure 9. Comparison between the fitting results of FITLYMAN (solid line) and VPFIT (dotted line) for the QSO Q0109-3518. The lower panel shows the total number of lines, while the upper one shows the sum of the column densities of all the lines, in redshift bins of width Δz = 0.01. The two redshift intervals where the column density measured with FITLYMAN drops to zero correspond to masked metal lines falling at those redshifts.

Figure 10. Contour scatter plot of the FITLYMAN versus VPFIT reconstructed density fields. The contours show the number density of pixels, which increases by a factor of 10 at each level.



4 SIMULATED DATA SAMPLE 

We use simulations run with the parallel hydro-dynamical (TreeSPH) code GADGET-2, based on the conservative 'entropy formulation' of SPH (Springel 2005). They consist of a cosmological volume with periodic boundary conditions filled with an equal number of dark matter and gas particles. Radiative cooling and heating processes are followed for a primordial mix of hydrogen and helium. We assume a mean UV background (UVB) produced by QSOs and galaxies as given by Haardt & Madau (1996), with helium heating rates multiplied by a factor of 3.3 in order to better fit observational constraints on the temperature evolution of the IGM. This background naturally gives Γ ≈ 10^-12 s^-1 (H I photo-ionisation rate) at the redshifts of interest here (Bolton et al. 2005). The star formation criterion is a very simple one that converts into collisionless stars all the gas particles whose temperature falls below 10^5 K and whose density contrast is larger than 1000 (it has been shown that the star formation criterion has a negligible impact on flux statistics). More details can be found in Viel, Haehnelt & Springel (2004).

We use 2 × 400³ dark matter and gas particles in a 120 h^-1 comoving Mpc box (although for some cross-checks we analyse smaller boxes of 60 h^-1 comoving Mpc). The gravitational softening is set to 5 h^-1 kpc in comoving units for all particles.

We stress that the parameters chosen here, including the thermal history of the IGM, are in reasonably good agreement with observational constraints, including recent results on the CMB and other results obtained by the Lya forest community (e.g. Viel, Haehnelt & Lewis 2006).

The 120 h^-1 Mpc simulation box at z = 2 is pierced to create a set of 364 mock lines of sight, each covering a redshift range Δz ≈ 0.11. For each of these lines of sight we know the density contrast, the temperature and the peculiar velocity pixel by pixel. Peculiar velocities are small, typically less than 100 km s^-1, and randomly oriented, so their contribution, e.g. to the correlation function, is in general negligible. However, since we want to compare the results of simulations and observations, we modify the redshifts of the density field (z_old) with the peculiar velocity field to obtain the density field in redshift space (z_new), using the formula v_pec(z_old) = c (z_new - z_old)/[1 + (z_new + z_old)/2]. We added Gaussian noise with S/N = 50 to the simulated spectra, in order to reproduce the observed average S/N per pixel. The simulated lines of sight have been fitted with Voigt profiles using an automated version of VPFIT.
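Solving the redshift-space formula above for z_new gives z_new = [z_old (1 + x/2) + x]/(1 - x/2), with x = v_pec/c. The sketch below implements this, together with the addition of Gaussian noise; it assumes a continuum-normalised flux and a noise level constant along the spectrum (σ = 1/(S/N)), which is our simplification, and the function names are hypothetical.

```python
import numpy as np

C_KMS = 299792.458  # speed of light [km/s]


def to_redshift_space(z_old, v_pec):
    """Shift real-space redshifts by peculiar velocities v_pec [km/s].

    Solves v_pec = c (z_new - z_old) / [1 + (z_new + z_old)/2] for z_new.
    """
    x = np.asarray(v_pec, dtype=float) / C_KMS
    return (np.asarray(z_old, dtype=float) * (1.0 + 0.5 * x) + x) / (1.0 - 0.5 * x)


def add_noise(flux, snr=50.0, rng=None):
    """Add Gaussian noise with sigma = 1/snr to a continuum-normalised flux."""
    rng = np.random.default_rng() if rng is None else rng
    return flux + rng.normal(0.0, 1.0 / snr, size=np.shape(flux))
```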



4.1 Reconstruction of the δ field

The Lyα lines in each simulated line of sight are selected to have, as in the case of the observations, $b \geq 10$ km s$^{-1}$. We introduce a further constraint, $b \leq 100$ km s$^{-1}$, which is required because the simulated spectra are not continuum fitted: shallow and broad oscillations in the simulated spectra are fitted as absorption lines with Doppler parameters of the order of thousands of km s$^{-1}$. In the real spectra, this kind of oscillation is instead absorbed into the continuum fit, and b parameters that large are not measured. The selected lines are grouped into absorbers and transformed into the corresponding density field following the procedure described in Section 3. In eq. (1), the values $T = 1.8 \times 10^4$ K and $\alpha = 0.6$ are adopted, which are consistent with an early re-ionisation epoch and are the ones inferred from the simulations.

Figure 11. Contour scatter plot of the true versus reconstructed δ field from simulations. The contours show the number density of pixels, which increases by a factor of 10 at each level.
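The Doppler-parameter selection applied to the simulated line lists can be sketched as follows. The function and argument names are hypothetical. We assume the line lists are stored as parallel arrays of redshift, log N(H I) and b (in km s$^{-1}$), and we treat both bounds as inclusive.

```python
import numpy as np


def select_lines(z, log_n, b, b_min=10.0, b_max=100.0):
    """Keep only lines with b_min <= b <= b_max (b in km/s).

    The upper bound discards the very broad 'lines' that arise from fitting
    continuum oscillations in spectra that are not continuum fitted.
    """
    z, log_n, b = (np.asarray(a, dtype=float) for a in (z, log_n, b))
    keep = (b >= b_min) & (b <= b_max)
    return z[keep], log_n[keep], b[keep]
```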

The reconstructed field is compared with the original density field (i.e. the output of the simulation), which is also binned in steps of one Jeans length, $L_J$. An upper threshold, $\delta_{\rm thr} = 50$, is adopted for both the true and the recovered δ field, since 99.95 percent of the pixels in the simulated lines of sight have $\delta \leq \delta_{\rm thr}$ and the algorithm used to recover the δ field (eq. 1) is valid only for values of δ up to a few × 10. The upper cut is applied before the normalisation process.

The average values of the δ field over all the 364 simulated spectra are $\langle \delta + 1 \rangle \approx 0.9$ and 1.3 for the true and recovered field, respectively. The fields are normalised using these values.
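A minimal sketch of the cut-then-normalise sequence is given below. We read "normalised using these values" as dividing $(1 + \delta)$ by its mean, so that the normalised field has $\langle 1 + \delta \rangle = 1$. That reading, the NaN convention for excluded pixels and the function name are our assumptions.

```python
import numpy as np


def threshold_and_normalise(delta, delta_thr=50.0):
    """Apply the upper cut delta_thr, then rescale so that <1 + delta> = 1.

    Assumption: normalisation means (1 + delta) / <1 + delta> - 1, with the
    mean taken over all valid (non-NaN) pixels after the cut.
    """
    clipped = np.minimum(np.asarray(delta, dtype=float), delta_thr)  # cut first
    one_plus = clipped + 1.0
    return one_plus / np.nanmean(one_plus) - 1.0
```

Applying the cut before computing the mean matters: a few extreme pixels would otherwise dominate $\langle 1 + \delta \rangle$.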

Figure 11 shows the contour scatter plot of the original versus the reconstructed density field. As can be seen from the figure, FLO reconstructs the original field fairly well above the mean density, while under-densities are underestimated, or not recovered at all if they are below our lower threshold. Indeed, the lower horizontal tail in the scatter plot is due to the treatment of the empty bins during the absorber-to-field transformation. The upper horizontal tail is instead due to the cut applied to over-densities larger than the threshold, $\delta_{\rm thr}$. Figure 12 shows the distribution of δ values in the true and recovered field. The peak at $\log(\delta + 1) \approx -0.83$ contains ≈ 53 percent of all the points and is due to the procedure that assigns to empty bins the value of δ corresponding to the redshift of the bin and to the minimum observed column density, $N({\rm H\,I}) = 10^{12}$ cm$^{-2}$. On the other hand, the small bump at $\log(\delta + 1) \approx 1.57$ is due to the upper cut applied to the recovered density field and includes ≈ 0.4 percent of the total number of points.

The transformation starts to recover more than half of the correct values of δ at $\log(\delta + 1) \approx -0.15$ and recovers all the δ values within 30 percent in the range $-0.08 < \log(\delta + 1) < 1.45$. Clearly, we are not dealing correctly with the under-dense regions, even when they are above our observational detection limit. This is likely because our primary hypothesis, local hydrostatic equilibrium, is not valid for those regions. This was also discussed by Schaye (2001), and here we find evidence that below the mean density the gas is still expanding.

Figure 12. Distribution of δ values in the true (solid line) and recovered (dashed line) field, normalised to the total number of points.
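The "fraction recovered within 30 percent" quoted above can be computed per bin of the true density as sketched below. We assume the relative error is measured on $(1 + \delta)$ rather than on δ itself, since δ can be close to zero. That choice, the binning in $\log(\delta_{\rm true} + 1)$ and the function name are our assumptions.

```python
import numpy as np


def recovered_fraction(delta_true, delta_rec, edges, tol=0.3):
    """Fraction of pixels, per bin of log10(delta_true + 1), recovered within tol.

    edges : bin edges in log10(delta_true + 1)
    Assumption: a pixel counts as recovered if |(1+d_rec) - (1+d_true)| <= tol (1+d_true).
    """
    t = np.asarray(delta_true, dtype=float) + 1.0
    r = np.asarray(delta_rec, dtype=float) + 1.0
    good = np.abs(r - t) <= tol * t
    idx = np.digitize(np.log10(t), edges) - 1  # bin index of each pixel
    frac = np.full(len(edges) - 1, np.nan)     # NaN for empty bins
    for i in range(len(edges) - 1):
        in_bin = idx == i
        if np.any(in_bin):
            frac[i] = np.mean(good[in_bin])
    return frac
```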

In the next section, it will be shown that the inaccura- 
cies in the under-density regime do not significantly affect 
statistical measures like the correlation function. 



4.2 Two-point statistics of the δ field

We computed the correlation function for the original and recovered δ field with the formula:

$$\xi_\delta(r) = \langle \delta(x + r)\,\delta(x) \rangle, \qquad (4)$$

where x is the position along the line of sight and r is the separation between two points, in comoving Mpc. $\xi_\delta(r)$ quantifies the clustering properties of the field considered, showing a signal significantly different from zero at separations where the field presents structures (over- or under-densities).

The bin size is the largest value of $L_J$ for our sample, ≈ 1.532 comoving Mpc, corresponding to the minimum redshift.
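Eq. (4) can be estimated on the binned fields as sketched below, with $r$ equal to the lag in bins times the bin size. Pooling pixel pairs from all lines of sight before averaging, and marking excluded (e.g. masked) bins with NaN, are our assumptions about the implementation. The function name is hypothetical.

```python
import numpy as np


def correlation_function(delta_los, max_lag):
    """Estimate xi(r) = <delta(x + r) delta(x)> for r = 0 .. max_lag bins.

    delta_los : iterable of 1D arrays (one per line of sight) on a regular grid;
                NaN marks excluded bins, which are skipped in every pair.
    Pairs from all lines of sight are pooled before averaging.
    """
    sums = np.zeros(max_lag + 1)
    counts = np.zeros(max_lag + 1)
    for d in delta_los:
        d = np.asarray(d, dtype=float)
        for lag in range(min(max_lag, len(d) - 1) + 1):
            a, b = d[: len(d) - lag], d[lag:]
            ok = ~(np.isnan(a) | np.isnan(b))
            sums[lag] += np.sum(a[ok] * b[ok])
            counts[lag] += np.count_nonzero(ok)
    # NaN where no valid pair exists at that lag
    return np.divide(sums, counts, out=np.full(max_lag + 1, np.nan), where=counts > 0)
```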

Figure 13 plots the correlation function of the true and recovered δ field. The value in each bin is the median over 50 samples of 88 lines of sight, obtained with a bootstrap technique from the 364 lines of sight of the total sample. This procedure is required in order to compare this result with the analogous one for the observed data (see Section 5.1): we have 22 observed spectra, but each one covers a redshift range corresponding to about 4 simulated spectra. Error bars are 1σ, computed from the percentiles of the distribution of values in each bin. The recovered correlation function is in very good agreement with the true one at every separation.

Figure 13. Correlation function of δ from simulations. Crosses represent the correlation function obtained from the original data, triangles the one obtained from the reconstructed field.

Figure 14. One-dimensional power spectrum of the hydrogen density contrast field. The solid and the dashed lines represent the results for the reconstructed and the original δ field, respectively.
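The bootstrap described above (50 resamplings of 88 lines of sight out of 364, median per bin, 1σ errors from percentiles) can be sketched as follows. Drawing with replacement and taking the 15.87th and 84.13th percentiles as the 1σ interval are our assumptions. `statistic` would be, for example, an estimator of eq. (4) applied to the resampled lines of sight.

```python
import numpy as np


def bootstrap_median_and_1sigma(los_list, statistic, n_samples=50, n_draw=88, seed=None):
    """Median and 1-sigma percentile range of `statistic` over bootstrap samples.

    los_list  : list of lines of sight (any objects `statistic` understands)
    statistic : function mapping a list of lines of sight to an array of values
    """
    rng = np.random.default_rng(seed)
    estimates = []
    for _ in range(n_samples):
        idx = rng.integers(0, len(los_list), size=n_draw)  # draw with replacement
        estimates.append(statistic([los_list[i] for i in idx]))
    estimates = np.asarray(estimates, dtype=float)
    median = np.median(estimates, axis=0)
    lo, hi = np.percentile(estimates, [15.87, 84.13], axis=0)
    return median, lo, hi
```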

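We also estimate the one-dimensional power spectrum of the hydrogen density contrast field, which is defined through the Fourier transform of the correlation function:

$$\xi_\delta(r) = \frac{1}{2\pi}\int_{-\infty}^{+\infty} P_{1D}(k)\, e^{ikr}\, \mathrm{d}k, \qquad P_{1D}(k) \equiv \langle |\delta_k|^2 \rangle. \qquad (5)$$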

Figure 15. Distribution with redshift of the average reconstructed δ values for 4 sub-samples of QSOs selected by their emission redshift, as displayed in the plot (formed by 6, 6, 5 and 5 objects as redshift increases). Horizontal lines represent the redshift coverage of each QSO sample, while vertical lines show the spread in average δ values for the single QSOs in the samples.

The power spectrum is computed with the Fast Fourier Transform (FFT) technique, which requires the field to be evenly sampled. To this end, we have re-binned the observed lines of sight to a constant step equal to the minimum Jeans length for the considered Lyα forest (corresponding to the maximum redshift). Then, the following steps have been applied:

1) a grid of wave-numbers is built in Fourier space, starting from $k_{\rm min} = 2\pi/\Delta r$, where $\Delta r$ is the length of a line of sight in comoving Mpc, and formed by $n_{\rm pix}/2$ evenly spaced elements, where $n_{\rm pix}$ is the number of pixels of the original δ field;

2) the Fourier transform of the δ field is computed;

3) the products $\delta_k \delta_k^*$ are averaged in each bin;

4) $P_{1D}(k)$ is normalised by multiplying it by the line-of-sight length, $\Delta r$;

5) the resulting $P_{1D}(k)$ is smoothed on larger bins to reduce the noise;

6) the result is averaged over all the lines of sight.
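Steps 1) to 6) can be sketched with NumPy's real FFT as below. This is an illustration under our own assumptions, not the authors' code. We assume all lines of sight are re-binned to the same pixel size `dx` and number of pixels, and that the transform is normalised as $\delta_k = (1/n_{\rm pix})\sum_j \delta_j e^{-ikx_j}$, so that step 4) gives $P_{1D} = \Delta r\,|\delta_k|^2$. We also assume smoothing is done by averaging groups of adjacent modes. The function name and the `rebin` parameter are hypothetical.

```python
import numpy as np


def power_spectrum_1d(delta_los, dx, rebin=1):
    """1D power spectrum of evenly sampled delta fields, following steps 1-6.

    delta_los : array (n_los, n_pix), one evenly re-binned line of sight per row
    dx        : pixel size in comoving Mpc (e.g. the minimum Jeans length)
    rebin     : number of adjacent k modes averaged together (step 5)
    """
    delta_los = np.atleast_2d(np.asarray(delta_los, dtype=float))
    n_los, n_pix = delta_los.shape
    length = n_pix * dx  # line-of-sight length, Delta r
    # Step 1: k_m = 2 pi m / Delta r for m = 1 .. n_pix/2 (k_min = 2 pi / Delta r)
    k = 2.0 * np.pi * np.arange(1, n_pix // 2 + 1) / length
    # Step 2: discrete Fourier transform, normalised by 1/n_pix; drop the k = 0 mode
    delta_k = np.fft.rfft(delta_los, axis=1)[:, 1 : n_pix // 2 + 1] / n_pix
    # Steps 3-4: delta_k delta_k^* times the line-of-sight length
    p = length * np.abs(delta_k) ** 2
    # Step 5 (and the per-bin averaging of step 3): average groups of `rebin` modes
    n_bins = len(k) // rebin
    k_s = k[: n_bins * rebin].reshape(n_bins, rebin).mean(axis=1)
    p_s = p[:, : n_bins * rebin].reshape(n_los, n_bins, rebin).mean(axis=2)
    # Step 6: average over all lines of sight
    return k_s, p_s.mean(axis=0)
```

For a single cosine of amplitude A on mode m, this normalisation puts all the power, $\Delta r\,A^2/4$, in the m-th bin. For lines of sight of different lengths, one would interpolate each $P_{1D}$ onto a common k grid before step 6), which this sketch omits.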

Error bars are computed using a jackknife estimator (Bradley 1982) on the whole sample of simulated lines of sight.
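A delete-one jackknife over lines of sight is sketched below. `statistic` is any function mapping a list of lines of sight to an estimate, for example the mean power spectrum. Leaving out one line of sight at a time is our assumption about how the sample was partitioned, and the function name is hypothetical.

```python
import numpy as np


def jackknife_error(los_list, statistic):
    """Delete-one jackknife standard error of `statistic` over lines of sight.

    sigma^2 = (n - 1)/n * sum_i (theta_i - mean(theta))^2, where theta_i is the
    statistic computed with line of sight i left out.
    """
    n = len(los_list)
    theta = np.asarray(
        [statistic([los_list[j] for j in range(n) if j != i]) for i in range(n)],
        dtype=float,
    )
    mean = theta.mean(axis=0)
    return np.sqrt((n - 1) / n * np.sum((theta - mean) ** 2, axis=0))
```

For the simple mean of scalars, this reduces to the usual standard error $s/\sqrt{n}$, which makes a convenient sanity check.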

Figure 14 shows the corresponding $\Delta^2(k) = k P_{1D}(k)/2\pi$ for the true and recovered simulated δ fields. The two power spectra are consistent at the 3σ level on scales $20 < r < 60$ comoving Mpc.
5 FLO APPLIED TO THE OBSERVED DATA SAMPLE



In Section 2.1 we have described how the line lists are compiled for the 22 high resolution QSO spectra forming our sample. The procedure explained in Section 3 is then applied to obtain the corresponding density contrast field for each line of sight. In the case of the observations, we have to take into account the masked intervals covering regions occupied by metal absorption systems: we eliminate all the bins that are more than 30 percent covered by masked intervals. Before the normalisation step, we apply the same upper threshold as for the simulations ($\delta_{\rm thr} = 50$), since we want to compare our result with the one obtained in Section 4.2. The pixels above the threshold correspond to ≈ 0.9 percent of the total number of pixels. Figures 10 and 11 show the uncertainties associated with the use of a different fitting tool (in particular, VPFIT and FITLYMAN) and those intrinsic to FLO, respectively. Since the intrinsic errors turn out to be larger than those induced by the fitting technique, we can safely compare the results obtained from the simulations with those obtained from the data sample, presented in this section.
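The rejection of bins more than 30 percent covered by masked metal-line intervals can be sketched as follows. We assume bins and masks are given in the same coordinate (e.g. comoving distance along the line of sight). We also merge overlapping masks first so that doubly masked regions are not counted twice. The function names are hypothetical.

```python
import numpy as np


def merge_intervals(intervals):
    """Merge overlapping (lo, hi) intervals into disjoint ones."""
    merged = []
    for lo, hi in sorted(intervals):
        if merged and lo <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], hi)
        else:
            merged.append([lo, hi])
    return merged


def masked_fraction(bin_edges, intervals):
    """Fraction of each bin covered by the union of the masked intervals."""
    edges = np.asarray(bin_edges, dtype=float)
    merged = merge_intervals(intervals)
    frac = np.zeros(len(edges) - 1)
    for i, (a, b) in enumerate(zip(edges[:-1], edges[1:])):
        covered = sum(max(0.0, min(b, hi) - max(a, lo)) for lo, hi in merged)
        frac[i] = covered / (b - a)
    return frac
```

The bins kept for the analysis would then be those with `masked_fraction(edges, masks) <= 0.3`.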

Figure 15 shows the average values of the recovered δ fields for 4 sub-samples built from the 22 observed lines of sight, selected on the basis of the QSO emission redshifts. The spread of average δ values for the single QSOs forming the samples is also shown. There is no significant trend with redshift, as expected if the density field follows, on average, the evolution of the cosmic mean value.



5.1 Two-point statistics of the δ field

The correlation function for the observed δ field is computed with eq. (4), as for the simulations.

The result is shown in Fig. 16. Here the value in each bin is obtained by averaging over the whole sample, while the error bars are computed by creating 50 samples of 22 lines of sight, drawn from our sample with a bootstrap technique, and taking the percentiles of the distribution corresponding to 1σ errors. The bin size is ≈ 1.6 comoving Mpc, which is $L_J$ at the lowest redshift of the sample. The clustering signal is significant at more than the 3σ level in the first two bins ($r < 4.5$ comoving Mpc). We have superimposed on the data points the TPCF obtained from the recovered δ field of the simulations. The two correlation functions are in very good agreement, confirming the validity of the cosmological parameters adopted in the simulation.

Since $P_{1D}(k)$ is very sensitive to the cosmological parameters, it is important to check whether the predictions for this function agree with the observed values.

In the case of the observed spectra, the masked metal lines make the starting grid of pixels unevenly spaced, and thus not suited to the application of the FFT. To overcome this problem, as a first-order approximation, the masked bins have been set to the average density. This procedure is based on the observation that the Lyα forest gas traces, on average, the mean density, and on the analogous method adopted in the computation of the power spectrum of the transmitted flux (Viel, Haehnelt & Springel 2004). Fig. 17 shows the result of this computation: the power spectrum obtained from our data is in excellent agreement with the one obtained from the density fields reconstructed with FLO from the simulated spectra based on a concordance cosmological model.

Figure 16. Correlation function of the δ field reconstructed from our 22 observed QSO spectra. Points refer to the data, while the line represents the prediction from the simulation.

Figure 17. Power spectrum of the δ field reconstructed from our 22 observed QSO spectra. The dashed line represents the prediction from the simulation.
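The filling of masked bins before the FFT can be sketched as below. We assume the field has already been normalised so that $\langle 1 + \delta \rangle = 1$, in which case "average density" corresponds to $\delta = 0$. The function name and the boolean-mask input are hypothetical.

```python
import numpy as np


def fill_masked_bins(delta, masked):
    """Return an evenly sampled copy of delta with masked bins set to the mean density.

    Assumption: delta is normalised so that <1 + delta> = 1, hence mean density <=> delta = 0.
    """
    out = np.array(delta, dtype=float)  # copy; the input array is left untouched
    out[np.asarray(masked, dtype=bool)] = 0.0
    return out
```

Filling with the mean, rather than dropping bins, keeps the grid regular, at the cost of slightly suppressing power on the scales of the masked regions.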
Error bars are computed using a jackknife estimator on the whole sample of observed QSOs. The quantity which is generally compared with model predictions is the 3D power spectrum, which is obtained from the 1D one by differentiation. A comparison of rough estimates of $P_{3D}(k)$ from our observed and simulated data gives good agreement. However, we postpone a careful study of this quantity, and of the resulting constraints on the cosmological parameters, to a forthcoming paper.
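The differentiation mentioned above is not written out in this section. For an isotropic field and the Fourier convention of eq. (5), the standard relation is $P_{1D}(k) = (1/2\pi)\int_k^\infty q\,P_{3D}(q)\,\mathrm{d}q$, hence $P_{3D}(k) = -(2\pi/k)\,\mathrm{d}P_{1D}/\mathrm{d}k$. The finite-difference sketch below is our illustration and is not necessarily the authors' estimator. A measured, noisy $P_{1D}$ would normally need smoothing before being differentiated.

```python
import numpy as np


def p3d_from_p1d(k, p1d):
    """P_3D(k) = -(2 pi / k) dP_1D/dk, with a second-order finite-difference derivative.

    Assumes P_1D(k) = (1/2pi) * integral_k^inf q P_3D(q) dq (isotropic field,
    Fourier convention of eq. 5) and a monotonic k grid.
    """
    k = np.asarray(k, dtype=float)
    dp_dk = np.gradient(np.asarray(p1d, dtype=float), k, edge_order=2)
    return -2.0 * np.pi / k * dp_dk
```

The analytic pair $P_{3D}(q) = e^{-q}$, $P_{1D}(k) = (k + 1)\,e^{-k}/2\pi$ is a convenient test of the sign and normalisation.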
6 CONCLUSIONS

We have presented results from the analysis of the largest sample of fitted Lyα lines to date, obtained from 22 high resolution QSO spectra covering the redshift range between ≈ 1.7 and 3.5.

In particular, we have computed:

1.a) the line number density evolution with redshift, for which we find $\mathrm{d}n/\mathrm{d}z \simeq (166 \pm 4)\,[(1 + z)/3.5]^{2.8 \pm 0.2}$. While the redshift evolution is consistent with previous results, the normalisation is higher by ≈ 0.03 in log(dn/dz) at z ≈ 2, increasing to 0.1 at z ≈ 3. This difference is due to the improved treatment of the contamination by metal lines (amounting to ≈ 9 percent of the redshift interval covered by the Lyα forests), which is made possible by the high resolution and signal-to-noise ratio of our spectra. Consistently with Kim et al. (2002), we also find a steeper evolution for the stronger lines ($14.5 \leq \log N({\rm H\,I}) < 17$) compared to the weak ones ($13 < \log N({\rm H\,I}) < 14$).
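As a worked example of the fitted power law, the sketch below evaluates dn/dz and integrates it analytically to obtain the expected number of lines in a redshift interval, using $\int A\,[(1+z)/3.5]^{\gamma}\,\mathrm{d}z = 3.5\,A\,[(1+z)/3.5]^{\gamma+1}/(\gamma+1)$. Only the best-fitting values are used (uncertainties are ignored), and the helper names are ours.

```python
import numpy as np

A, GAMMA = 166.0, 2.8  # best-fitting normalisation and slope quoted above


def dn_dz(z):
    """Line number density per unit redshift: A [(1 + z)/3.5]^GAMMA."""
    return A * ((1.0 + np.asarray(z, dtype=float)) / 3.5) ** GAMMA


def expected_lines(z1, z2):
    """Expected number of lines between z1 and z2 (analytic integral of dn/dz)."""
    g1 = GAMMA + 1.0
    return A * 3.5 / g1 * (((1.0 + z2) / 3.5) ** g1 - ((1.0 + z1) / 3.5) ** g1)
```

With these values, dn/dz ≈ 108 at z = 2 and ≈ 241 at z = 3, reflecting the steep evolution of the line density.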

1.b) the two-point correlation function (TPCF), which shows a significant clustering signal up to ≈ 2 comoving Mpc for strong lines ($13.8 \leq \log N({\rm H\,I}) < 17$) and, on smaller scales (< 1.5 Mpc), also for weak lines ($12 \leq \log N({\rm H\,I}) < 13.8$). We then calculated the TPCF after grouping all the lines closer than the local Jeans length (the assumed typical size of the hydrogen absorbers in the IGM). The signal is still significant in the first bin ($r < 2.5$ Mpc).
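The grouping of lines closer than the local Jeans length can be sketched as follows. The details are our assumptions, not the paper's recipe. Lines are chained friends-of-friends style, so each line is compared with the previous member of its group. A group's column density is the sum of its members, and its position is the N(H I)-weighted mean. The function name is hypothetical.

```python
import numpy as np


def group_lines(x, log_n, jeans_length):
    """Merge lines closer than the local Jeans length into single absorbers.

    x            : line positions along the line of sight (comoving Mpc)
    log_n        : log10 N(HI) of each line
    jeans_length : local Jeans length at each line (comoving Mpc)
    Returns the position and log10 of the summed column density of each group.
    """
    order = np.argsort(x)
    x = np.asarray(x, dtype=float)[order]
    n = 10.0 ** np.asarray(log_n, dtype=float)[order]
    lj = np.asarray(jeans_length, dtype=float)[order]
    if x.size == 0:
        return x, x
    groups = [[0]]
    for i in range(1, x.size):
        # chain to the current group if closer than L_J to its last member
        if x[i] - x[groups[-1][-1]] < lj[i]:
            groups[-1].append(i)
        else:
            groups.append([i])
    pos = np.array([np.average(x[g], weights=n[g]) for g in groups])
    log_n_tot = np.array([np.log10(n[g].sum()) for g in groups])
    return pos, log_n_tot
```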

1.c) the TPCF evolution with redshift for strong lines: we divided our sample into two sub-samples, the first formed by objects with $z_{\rm em} < 2.5$, for which the average Lyα forest redshift is $\langle z_{\rm Ly\alpha} \rangle = 2.07$, and the second formed by objects with $z_{\rm em} > 2.5$, with $\langle z_{\rm Ly\alpha} \rangle = 3.02$. The TPCF computed for lines with $13.8 < \log N({\rm H\,I}) < 17$ in these two samples shows a trend of increasing clustering with decreasing redshift. This is an apparent evolution, due to the fact that the δ–log N(H I) relation is z-dependent. Indeed, a selection of lines tracing the same kind of structures (characterised by δ > 3) shows no evidence of a significant evolution of the TPCF with redshift.

In the second part of the paper, we have described FLO, a new algorithm to transform the measured H I column densities of the Lyα lines detected along a line of sight into the underlying total H density field (in particular, the density contrast field, $\delta = n_{\rm H}/\langle n_{\rm H} \rangle - 1$). The method is based on the assumption that Lyα absorbers are in local hydrostatic equilibrium and, as a consequence, that the Jeans length corresponds to their characteristic size. The aim of this study is to find a robust statistical estimator which allows a direct link to the physical properties of the gas and an easy comparison with the results of simulations. To test the effects of the transformation, we have used a set of 364 lines of sight obtained from a large N-body hydrodynamical simulation run in a box of $120\,h^{-1}$ comoving Mpc. For every line of sight we have both the density and velocity fields pixel by pixel and the list of Voigt-fitted Lyα lines with central redshift, column density and Doppler parameter. Our results can be summarised as follows:

2.a) FLO recovers the over-densities extremely well (within 30 percent) up to δ ≈ 30, while it does not correctly reproduce the under-densities (more than 50 percent of the δ values are not recovered), even in the range above our resolution limit. This result suggests that the hypothesis of hydrostatic equilibrium is not valid for the under-dense regions, which are likely still expanding. On the other hand, for the goal of our study, that is the computation of statistical properties of the IGM, the resulting δ field gives satisfactory results when the two-point correlation function and the 1D power spectrum are considered: the results obtained with the true δ field of the simulation and with the field reconstructed from line column densities with our algorithm are in very good agreement.

When applied to the observed data sample, the FLO technique gives the following results:

2.b) the redshift distribution of the average hydrogen density is consistent with the evolution of the cosmic mean hydrogen density in the redshift range covered by our QSO sample, supporting the picture that the Lyα forest arises from fluctuations of the IGM close to the mean density;
2.c) the correlation function of the density field obtained from the observed spectra shows a significant clustering signal up to ≈ 4 comoving Mpc and is consistent with the analogous result obtained for the recovered density field in a simulation based on the concordance cosmological model;
2.d) the one-dimensional power spectrum of the δ field obtained from the observed spectra is in very good agreement with the same quantity obtained from the recovered density field of the simulation based on the concordance cosmological model, on scale lengths between ≈ 2.5 and 63 comoving Mpc.

The algorithm presented in this work is particularly useful for extracting information from observations in terms of over-densities, making possible a more direct and convenient comparison with simulations.